Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis
Authors
Abstract
We study the performance of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for an ML classifier to exhibit error probabilities that vanish exponentially, say $\exp(-n\,I)$, where $n$ is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and $I$ is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F, and hence the related error rate, depend on the given training set. The exponential convergence can be verified and tested numerically by exploiting the available dataset or a synthetic dataset generated according to the underlying statistical model. Consistent with large deviations theory, we also establish the convergence of the normalized D3F statistic to a Gaussian distribution. Furthermore, approximate error probability curves $\zeta_n \exp(-n\,I)$ are provided, thanks to a refined asymptotic derivation, where $\zeta_n$ represents the most representative sub-exponential terms of the error probabilities. Leveraging the refined asymptotics, we are able to compute an accurate analytical approximation of the error probabilities in both the regimes of small and large values of $n$. Theoretical findings are corroborated by extensive numerical simulations and by the use of real-world data acquired by an X-band maritime radar system for surveillance.
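As a hedged illustration of how a rate such as $I$ can be estimated numerically from data, the sketch below assumes the simplest setting covered by Cramér's theorem: the D3F is the sample mean of $n$ i.i.d. per-observation scores, thresholded at $\gamma$. It estimates the cumulant-generating function $\Lambda(\theta) = \log \mathbb{E}[e^{\theta X}]$ empirically and approximates the Fenchel-Legendre transform $I(\gamma) = \sup_\theta [\theta\gamma - \Lambda(\theta)]$ by a grid search over $\theta$. The score distributions, threshold, grids, and the helper names `empirical_cgf` and `rate_function` are hypothetical and not taken from the paper, which treats a general learned D3F.

```python
import math
import random


def empirical_cgf(samples, theta):
    """Empirical cumulant-generating function log(mean(exp(theta * x))).

    The log-sum-exp shift keeps the computation stable for large |theta|.
    """
    shifted = [theta * x for x in samples]
    m = max(shifted)
    total = sum(math.exp(v - m) for v in shifted)
    return m + math.log(total / len(samples))


def rate_function(samples, gamma, thetas):
    """Fenchel-Legendre transform I(gamma) = sup_theta [theta*gamma - Lambda(theta)],
    approximated by the maximum over a finite grid of theta values."""
    return max(t * gamma - empirical_cgf(samples, t) for t in thetas)


# Hypothetical per-observation scores of a trained classifier (synthetic data):
# N(0, 1) under H0 (target absent) and N(1, 1) under H1 (target present).
rng = random.Random(0)
scores_h0 = [rng.gauss(0.0, 1.0) for _ in range(20000)]
scores_h1 = [rng.gauss(1.0, 1.0) for _ in range(20000)]

gamma = 0.5                                 # assumed threshold on the averaged score
pos_grid = [0.01 * k for k in range(201)]   # theta in [0, 2]: false-alarm exponent
neg_grid = [-0.01 * k for k in range(201)]  # theta in [-2, 0]: missed-detection exponent

i_fa = rate_function(scores_h0, gamma, pos_grid)
i_md = rate_function(scores_h1, gamma, neg_grid)

# For these Gaussian scores the exact rate is (gamma - mean)**2 / 2 = 0.125 for both.
print(f"I_FA ~ {i_fa:.3f}, I_MD ~ {i_md:.3f}")
for n in (10, 50, 100):
    # Leading-order approximation exp(-n*I) only; the sub-exponential factor zeta_n is omitted.
    print(n, math.exp(-n * i_fa), math.exp(-n * i_md))
```

Because the synthetic scores are Gaussian with unit variance, the exact exponents are $(\gamma-\mu)^2/2 = 0.125$, so both estimates should land close to that value. For a real classifier, the same two functions would be applied to scores computed on a portion of the available data. Restricting $\theta \ge 0$ for the false-alarm exponent and $\theta \le 0$ for the missed-detection exponent reflects which tail of the statistic each error event lies in.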
Similar resources
On Concentration and Revisited Large Deviations Analysis of Binary Hypothesis Testing
This paper first introduces a refined version of the Azuma-Hoeffding inequality for discrete-parameter martingales with uniformly bounded jumps. The refined inequality is used to revisit the large deviations analysis of binary hypothesis testing.
Bayesian Hypothesis Testing in Machine Learning
Most hypothesis testing in machine learning is done using the frequentist null-hypothesis significance test, which has severe drawbacks. We review recent Bayesian tests which overcome the drawbacks of the frequentist ones.
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge Discovery
The correlation and interactions among different biological entities comprise the biological system. Although already revealed interactions contribute to the understanding of different existing systems, researchers face many questions every day regarding inter-relationships among entities. Their queries have potential role in exploring new relations which may open up a new area of investigation....
Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under inv...
Analysis of Statistical Hypothesis based Learning Mechanism for Faster Crawling
The growth of world-wide-web (WWW) spreads its wings from intangible quantities of web pages to a gigantic hub of web information which gradually increases the complexity of crawling process in a search engine. A search engine handles a lot of queries from various parts of this world, and its answers depend solely on the knowledge that it gathers by means of crawling. The information s...
Journal
Journal title: IEEE Open Journal of Signal Processing
Year: 2022
ISSN: 2644-1322
DOI: https://doi.org/10.1109/ojsp.2022.3232284